Text Classification Using Stochastic Keyword Generation
نویسندگان
چکیده
This paper considers improving the performance of text classification, when summaries of the texts, as well as the texts themselves, are available during learning. Summaries can be more accurately classified than texts, so the question is how to effectively use the summaries in learning. This paper proposes a new method for addressing the problem, using a technique referred to as ’stochastic keyword generation’ (SKG). In the proposed method, the SKG model is trained using the texts and their associated summaries. In classification, a text is first mapped, with SKG, into a vector of probability values, each of which corresponds to a keyword. Text classification is then conducted on the mapped vector. This method has been applied to email classification for an automated help desk. Experimental results indicate that the proposed method based on SKG significantly outperforms other methods.
منابع مشابه
Text Classification using Language-independent Pre-processing
A number of language-independent text pre-processing techniques, to support multi-class single-label text classification, are described and compared. A simple but effective statistical keyword identification approach is proposed, coupled with a number of phrase identification mechanisms. Experimental results are presented.
متن کاملStatistical Identification of Key Phrases for Text Classification
Algorithms for text classification generally involve two stages, the first of which aims to identify textual elements (words and/or phrases) that may be relevant to the classification process. This stage often involves an analysis of the text that is both language-specific and possibly domain-specific, and may also be computationally costly. In this paper we examine a number of alternative keyw...
متن کاملAutomatic Content-Based Categorization of Wikipedia Articles
Wikipedia’s article contents and its category hierarchy are widely used to produce semantic resources which improve performance on tasks like text classification and keyword extraction. The reverse – using text classification methods for predicting the categories of Wikipedia articles – has attracted less attention so far. We propose to “return the favor” and use text classifiers to improve Wik...
متن کاملShort Text Feature Enrichment Using Link Analysis on Topic-Keyword Graph
In this paper, we propose a novel feature enrichment method for short text classification based on the link analysis on topic-keyword graph. After topic modeling, we re-rank the keywords distribution extracted by biterm topic model (BTM) to make the topics more salient. Then a topic-keyword graph is constructed and link analysis is conducted. For complement, the K-L divergence is integrated wit...
متن کاملKeyword Reduction for Text Categorization using Neighborhood Rough Sets
Keyword reduction is a technique that removes some less important keywords from the original dataset. Its aim is to decrease the training time of a learning machine and improve the performance of text categorization. Some researchers applied rough sets, which is a popular computational intelligent tool, to reduce keywords. However, classical rough sets model, which is usually adopted, can just ...
متن کامل